
[recipes] Fix UUID cursor in fingerprint dedup backfill#152

Open
demarant wants to merge 2 commits into NateBJones-Projects:main from demarant:fix/fingerprint-dedup-backfill-uuid-cursor

Conversation

@demarant
Contributor

@demarant demarant commented Apr 3, 2026

Contribution Type

  • Recipe (/recipes)
  • Schema (/schemas)
  • Dashboard (/dashboards)
  • Integration (/integrations)
  • Skill (/skills)
  • Repo improvement (docs, CI, templates)

What does this do?

Fixes the fingerprint dedup backfill scripts so they work with UUID primary keys on the `thoughts` table. The scripts paginated with an `id` cursor that started at the integer `0`, which Postgres rejects for a UUID column with `invalid input syntax for type uuid: "0"`. This change paginates on `created_at` (timestamp) instead, which works for any ID type.
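A minimal sketch of the timestamp-only pagination this PR switches to, assuming each page is read with `created_at > cursor` ordered by `created_at`. The in-memory `makeFetcher` and the `patchRows` callback are hypothetical stand-ins for the scripts' REST calls, and the loop is synchronous for brevity. Only `created_at` is ever compared, so the `id` values can be UUIDs:

```javascript
// Sketch of timestamp-only keyset pagination (the approach in this PR).
// fetchPage(cursor, limit) and patchRows(rows) are hypothetical callbacks;
// a real script would make these async network calls.
function backfillByTimestamp(fetchPage, patchRows, batchSize = 1000) {
  // Epoch start, matching the "Resuming from cursor created_at=1970-..." log line.
  let cursor = "1970-01-01T00:00:00Z";
  let processed = 0;
  for (;;) {
    const rows = fetchPage(cursor, batchSize); // rows with created_at > cursor, oldest first
    if (rows.length === 0) break;              // empty page: done
    patchRows(rows);
    processed += rows.length;
    // Advance to the last row's timestamp; the id is never used.
    cursor = rows[rows.length - 1].created_at;
  }
  return processed;
}

// Hypothetical in-memory stand-in for the thoughts table. All timestamps share
// one fixed-width UTC format, so string comparison is chronological.
function makeFetcher(table) {
  const sorted = [...table].sort((a, b) =>
    a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0);
  return (cursor, limit) => sorted.filter((r) => r.created_at > cursor).slice(0, limit);
}

const thoughts = [
  { id: "9b2e4c1a-0000-4000-8000-000000000001", created_at: "2025-04-12T13:52:38Z" },
  { id: "1a7f0d3e-0000-4000-8000-000000000002", created_at: "2025-04-12T13:52:39Z" },
  { id: "c4d5e6f7-0000-4000-8000-000000000003", created_at: "2025-04-12T13:52:40Z" },
];
const seen = [];
const total = backfillByTimestamp(makeFetcher(thoughts), (rows) => seen.push(...rows.map((r) => r.id)), 2);
console.log(total); // 3
```

This loop avoids the UUID error, but it still skips rows that share a timestamp across a batch boundary, which is the issue raised in review.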

Requirements

Checklist

  • I've read CONTRIBUTING.md
  • My contribution has a README.md with prerequisites, step-by-step instructions, and expected outcome
  • My metadata.json has all required fields
  • If my contribution depends on a skill or primitive, I declared it in metadata.json and linked it in the README — depends on content-fingerprint-dedup primitive (declared in existing README)
  • I tested this on my own Open Brain instance — 1815 rows backfilled successfully, 10 duplicates skipped, 0 errors
  • No credentials, API keys, or secrets are included

Test Results

```
$ node backfill-fingerprints.mjs

=== Backfill content_fingerprint ===
Resuming from cursor created_at=1970-01-01T00:00:00Z (0 already done)
Batch size: 1000

Batch 1: fetching from created_at>1970-01-01T00:00:00Z… 1000 rows. Patching…
  → 1000 patched. Total: 1000 patched, 0 duplicates, 0 errors. Cursor: 2025-04-12T13:52:40+00:00
Batch 2: fetching from created_at>2025-04-12T13:52:40+00:00… 825 rows. Patching…
  → 815 patched, 10 duplicates (skipped). Total: 1815 patched, 10 duplicates, 0 errors. Cursor: 2026-04-03T07:41:36.188656+00:00
Batch 3: fetching from created_at>2026-04-03T07:41:36.188656+00:00… (no rows) — Done.

=== COMPLETE ===
Total rows backfilled   : 1815
Total duplicate skipped : 10
Total other errors      : 0
State file cleaned up.
```
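The totals in the log reconcile batch by batch: 1000 + 825 = 1825 rows were fetched, and every fetched row ended up patched, skipped as a duplicate, or counted as an error (1815 + 10 + 0). A small tally helper makes that invariant explicit; `tallyBatches` is a hypothetical name, not code from the scripts, and the inputs are the per-batch counts copied from the run above:

```javascript
// Hypothetical helper: sum per-batch counters and check that every fetched
// row was accounted for as patched, duplicate, or error.
function tallyBatches(batches) {
  const totals = { fetched: 0, patched: 0, duplicates: 0, errors: 0 };
  for (const b of batches) {
    if (b.patched + b.duplicates + b.errors !== b.fetched) {
      throw new Error(`batch counts do not add up: ${JSON.stringify(b)}`);
    }
    for (const key of Object.keys(totals)) totals[key] += b[key];
  }
  return totals;
}

// Per-batch counts from the log (batch 3 returned no rows).
const totals = tallyBatches([
  { fetched: 1000, patched: 1000, duplicates: 0, errors: 0 },
  { fetched: 825, patched: 815, duplicates: 10, errors: 0 },
]);
console.log(totals); // { fetched: 1825, patched: 1815, duplicates: 10, errors: 0 }
```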

Outcome: ✅ Fix verified on a real Open Brain database with a UUID primary key.

- Use created_at instead of id for cursor pagination
- Fixes error when thoughts table has UUID primary key
- Tested on real Open Brain DB with 1815 rows backfilled
@github-actions github-actions bot added the recipe Contribution: step-by-step recipe label Apr 3, 2026
@justfinethanku
Collaborator

Thanks for the fix attempt here. I’m not merging it as-is, but this one has a clear path forward.

What needs to change:

  • use a stable pagination strategy that won’t skip rows with identical timestamps
  • the simplest fix is a composite cursor, for example created_at + id, rather than created_at alone
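To see the gap described above, the sketch below pages through an in-memory table with both a timestamp-only and a composite `(created_at, id)` predicate. It assumes canonical lowercase UUID strings (which sort in the same order as PostgreSQL's `uuid` type) and timestamps in one fixed UTC format so that string comparison is chronological; `paginate`, `afterTimestamp` and `afterComposite` are hypothetical names. The Batch 1 cursor in the test log (`2025-04-12T13:52:40+00:00`) has whole-second precision, which suggests ties at a batch boundary are realistic for this data.

```javascript
// Order rows by (created_at, id), the same order as ORDER BY created_at, id.
function compareKey(a, b) {
  if (a.created_at !== b.created_at) return a.created_at < b.created_at ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Page through `table` using the given "row is after cursor" predicate and
// return the ids in the order they were visited.
function paginate(table, isAfter, batchSize) {
  const sorted = [...table].sort(compareKey);
  const visited = [];
  let cursor = null; // null means "start from the beginning"
  for (;;) {
    const page = sorted.filter((r) => cursor === null || isAfter(r, cursor)).slice(0, batchSize);
    if (page.length === 0) return visited;
    visited.push(...page.map((r) => r.id));
    cursor = page[page.length - 1]; // remember the last row's full key
  }
}

// Timestamp-only cursor: WHERE created_at > :cursor_created_at
const afterTimestamp = (row, cur) => row.created_at > cur.created_at;
// Composite cursor: WHERE (created_at, id) > (:cursor_created_at, :cursor_id)
const afterComposite = (row, cur) => compareKey(row, cur) > 0;

// Three rows share one timestamp, and the batch boundary falls between them.
const rows = [
  { id: "00000000-0000-4000-8000-00000000000a", created_at: "2025-04-12T13:52:40Z" },
  { id: "00000000-0000-4000-8000-00000000000b", created_at: "2025-04-12T13:52:40Z" },
  { id: "00000000-0000-4000-8000-00000000000c", created_at: "2025-04-12T13:52:40Z" },
  { id: "00000000-0000-4000-8000-00000000000d", created_at: "2025-04-12T13:52:41Z" },
];

console.log(paginate(rows, afterTimestamp, 2).length); // 3: the ...0c row is skipped
console.log(paginate(rows, afterComposite, 2).length); // 4: every row is visited
```

In SQL the composite predicate corresponds to PostgreSQL's row comparison `(created_at, id) > ($1, $2)` together with `ORDER BY created_at, id`.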

If you want to update it in that direction, we’d be happy to take another look.

@justfinethanku
Collaborator

Keeping this open because it’s a useful fix, but it needs one correctness change before we can merge it.

Recommended next step:

  • replace the timestamp-only cursor with a stable composite cursor, for example created_at + id, so rows with identical timestamps cannot be skipped
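If the scripts read pages through Supabase's PostgREST endpoint (an assumption; the diff is not shown on this page), the composite cursor can be written as an `or` filter equivalent to `created_at > ts OR (created_at = ts AND id > last_id)`, with a matching two-column `order`. The sketch only builds the query string, and `buildPageQuery` is a hypothetical name. Timestamp values are wrapped in double quotes because PostgREST treats `,`, `.`, `:` and parentheses inside logical filters as reserved characters.

```javascript
// Hypothetical helper: build the PostgREST query string for one page of a
// composite (created_at, id) cursor. Pass cursor = null for the first page.
function buildPageQuery(cursor, batchSize, select = "*") {
  const params = new URLSearchParams();
  params.set("select", select);
  if (cursor !== null) {
    // Quote the timestamp: it contains "." and ":" (and "+" in the offset).
    // A canonical UUID has no reserved characters, so the id is left bare.
    const ts = `"${cursor.created_at}"`;
    params.set("or", `(created_at.gt.${ts},and(created_at.eq.${ts},id.gt.${cursor.id}))`);
  }
  // The sort order must match the cursor columns, or pages can overlap or skip rows.
  params.set("order", "created_at.asc,id.asc");
  params.set("limit", String(batchSize));
  return params.toString(); // percent-encodes '"', '+', ',' and parentheses
}

const qs = buildPageQuery(
  { created_at: "2025-04-12T13:52:40+00:00", id: "00000000-0000-4000-8000-00000000000b" },
  1000,
);
console.log(new URLSearchParams(qs).get("or"));
// (created_at.gt."2025-04-12T13:52:40+00:00",and(created_at.eq."2025-04-12T13:52:40+00:00",id.gt.00000000-0000-4000-8000-00000000000b))
```

The cursor for the next page is then the `created_at` and `id` of the last row returned, so both values need to be included in the selected columns and in any persisted resume state.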

Owner ask:

  • please update the pagination logic in both scripts and include a short note in the PR body describing how the revised cursor avoids tied-timestamp gaps

That should be enough for a fast re-review.


Labels

recipe Contribution: step-by-step recipe


2 participants